Java API Proposal #9137
Conversation
++ on the simplicity of the API and auto-discovery of plugins!
*/
public final class DiscoverPlugins {

    public static void main(final String... args) throws NoSuchMethodException {
What are your thoughts on isolated class loaders per plugin ?
@jakelandis I think I'd stay away from that for the time being (meaning, for while JRuby is around, probably ...). Though I think this is something we can work towards once we've gotten rid of JRuby types and Event only contains standard JDK types. But as long as Event holds RubyString etc., that introduces a lot of potentially complex constraints on the JRuby version that we'd have to enforce for plugins, and I'm not even sure it's easily possible in principle, since JRuby sets up its own classloader for the JRuby classes.
I agree with your assessment that this introduces a lot of complexity. However, if a security-managed LS is in the near future, it may be worth tackling that complexity sooner rather than later, since it's usually easier to bake things like that in than to layer them in later. Also, if a security-managed LS is in the future, then maybe Java 9 modules provide some wins there without isolated class loading (just speculation, I haven't researched).
I don't think a security-managed LS is that close. I'd rather bite off that complexity a bit later.
A codec is not what you describe. Codecs today operate on more than just streams. For example, the Redis input provides a fixed-size payload for each LPOP with no
for (final Class<?> cls : annotated) {
    System.out.println(cls.getName());
    System.out.println(((LogstashPlugin) cls.getAnnotations()[0]).name());
    final Constructor<?> ctor = cls.getConstructor(LsConfiguration.class);
There is a small handful of things we want each plugin to have beyond its configuration.
- Access to the DLQ
- Access to the scoped metric
- Settings (maybe)
- Logger (maybe)
- Possibly cross-cutting things in the future, like an abstracted TLS implementation or common JDBC abstractions.
What are your thoughts on handling the pushing of things other than Configuration to the plugins?
@jakelandis good point! We should have a second ctor argument that holds references to all this stuff (I'd call it LogstashContext, I guess). Will add it soon :)
added a "sketch" of this
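A rough sketch of the two-argument constructor idea from this exchange. LsConfiguration and LsContext are minimal stand-ins I made up for illustration; only the constructor shape reflects the discussion, not the real types.

```java
import java.util.Map;

// Stand-in for the proposed configuration type; the shape is an assumption.
final class LsConfiguration {
    private final Map<String, Object> raw;
    LsConfiguration(final Map<String, Object> raw) { this.raw = raw; }
    String getString(final String key) { return (String) raw.get(key); }
}

// Hypothetical context object bundling the cross-cutting services
// (DLQ, scoped metrics, logger) mentioned in the thread.
final class LsContext {
    String scopedMetricNamespace() { return "plugin.mutate"; }
}

public final class MutateSketch {
    private final String field;
    private final LsContext context;

    // The proposed two-argument plugin constructor: configuration plus context.
    public MutateSketch(final LsConfiguration configuration, final LsContext context) {
        this.field = configuration.getString("field");
        this.context = context;
    }

    public String field() { return field; }

    public static void main(final String... args) {
        final MutateSketch plugin = new MutateSketch(
            new LsConfiguration(Map.of("field", "message")), new LsContext());
        System.out.println(plugin.field()); // prints "message"
    }
}
```

The context object keeps the constructor signature stable as new cross-cutting services are added, which was the point of the suggestion.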
Yea, see it as a metaphor for now. I guess the best 1:1 Java equivalent of what the current API gets as an input is
EDIT: I don't think this approach will work. After speaking with @jordansissel, it doesn't map well to the file input, where we need stream identity on top of the input stream in a non-blocking fashion.
@jordansissel @original-brownbear Thinking about codecs, I think the 'real' signature is ... So, the
@original-brownbear (pulled from inline comment... thanks GitHub) How does core know what is valid / not valid configuration and type? For example, I configure: mutate { ... How does Logstash know to error? Does this require an external file to describe the valid configuration? Also, where would deprecation warnings, obsoleted warnings, and validation happen? If it is an external file, can you provide an example of what that may look like?
I believe this validation happens now in each plugin's
I agree with this strategy for custom-level validation. But for common validation, such as valid configuration options, type/file/other common validation, and deprecate/obsolete messaging, we are pushing a lot to the author of the plugin vs. the Ruby way, which already handles it.
Thanks for the excellent effort here. I've left a number of questions / concerns inline.
WRT codecs, I think we should leave them out of V1 of this proposal. Not every input needs them, and we can add them later.
*/
public interface Filter extends AutoCloseable {

    QueueReader filter(QueueReader reader);
The QueueReader isn't always a queue; obviously it could just be a previous filter. What about PipelineReader as a name?
@andrewvc I have no opinion here tbh. We can go with PipelineReader :) Or maybe just EventIterator?
I like the latter, EventIterator sounds good.

QueueReader filter(QueueReader reader);

void flush(boolean isShutdown);
I don't think we want to support flushing in the future; can we remove this from V1 here?
@andrewvc :D I'm always happy to remove flush, why do you even ask :) Will remove it
@Override
public long poll(final Event event) {
    final long seq = reader.poll(event);
    if (seq > -1L) {
When would the sequence be negative?
@andrewvc see https://github.com/elastic/logstash/pull/9137/files#diff-cf259dc78531f006beca5e143b529554R13 ... -1 indicates a failure to poll an event (think interrupts, timeouts).
Ah, of course :)
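The -1 contract described here could be exercised like this. Event and QueueReader are minimal stand-ins for the proposal's interfaces, and drain is a hypothetical helper, not part of the API.

```java
// Sketch of the poll contract discussed above: a sequence number of -1
// signals a failed poll (timeout, interrupt), not an error condition.
final class Event {
    String message;
}

interface QueueReader {
    long poll(Event event); // fills event; returns -1 on a failed poll
}

public final class PollLoop {
    // Drains events until the reader reports a failed poll.
    static int drain(final QueueReader reader, final Event scratch) {
        int count = 0;
        while (reader.poll(scratch) > -1L) {
            count++;
        }
        return count;
    }

    public static void main(final String... args) {
        final String[] data = {"a", "b", "c"};
        final int[] next = {0};
        final QueueReader reader = event -> {
            if (next[0] >= data.length) {
                return -1L; // nothing available: a real queue would time out here
            }
            event.message = data[next[0]];
            return next[0]++;
        };
        System.out.println(drain(reader, new Event())); // prints 3
    }
}
```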
public QueueReader filter(final QueueReader reader) {
    return new QueueReader() {
        @Override
        public long poll(final Event event) {
We really do want the ability for filtering to be done batch-wise. This is the only way to efficiently do things like Elasticsearch lookups. We kind of get away with not doing this with the JDBC filters, but that's only possible due to caching. Even with that, we can't exploit the ability to search multiple values in a single query for the cache-seeding portion.
Does this API let you create new events? I understand we can drop events with Event.cancel, but how would the split filter work?
@andrewvc the API doesn't prevent you from first collecting x Events before actually doing anything to them in a batched way, and then returning another Reader that wraps the batched results :) That's the beauty here imo ... if your plugin needs batches it can do batching; if it doesn't, it can save a bunch of memory by going one by one.
Also, there's nothing preventing you from creating new events here. Simply overwrite the given Event with the new event, don't increment the sequence number, don't call poll on the QueueReader, and you've created a new Event :)
Ah, I see! While I do like that I feel like we should implement higher level default implementations here for real-world plugins. The problem with patterns is that it's an extra place for bugs and code duplication. Previously each output would do its own batching and there'd be occasional bugs and differences between them.
Additionally, the batch size is set at the logstash level. We don't expose that option to the plugin, and each plugin would have to know what that size is to batch as the user is expecting it to. A default implementation could do that correctly.
Can we add the ability for plugin authors to instead define different callbacks as shown below? I think keeping the low-level API is fine, but I don't think it should be the default.
Something like...

// for the common case of writing a filter that takes one event and modifies (or cancels) it
public void filterSingle(Event event);

// for the common case of filters that take a batch
public void filterBatch(Collection<Event> events);
WRT creating new events, I realize that's relatively rare, but I think some sugar would be nice. Implementing the split filter would be awkward as is, because you need to extract stuff out of the event, then queue it waiting for poll to be invoked again.
Where do these Event objects come from in this API? Are they from an Event pool? Would there be a way to simply ask for a new event from the pool instead, without waiting for a new poll?
@original-brownbear so, in this example, I can get a new event by invoking what? It'd help me to see a short version of the clone filter written out here if you have the chance.
@andrewvc I think this is how I'd do clone here:

private Event clone;
private long lastSeq = -1L;

@Override
public long poll(final Event event) {
    if (clone != null) {
        event.overwrite(clone);
        clone = null;
        return lastSeq;
    }
    final long seq = reader.poll(event);
    lastSeq = seq;
    if (seq > -1L) {
        clone = event.clone();
    }
    return seq;
}
Just don't poll upstream if there is a clone to return, and don't increment the sequence number, since it's still the result of the last "real" input event that was polled from upstream.
After meeting with @original-brownbear we decided to use a filter-instance-per-pipeline pattern, as we do today. This means that there will be one instance used across pipeline worker threads. These instances will not be invoked concurrently, to prevent people from shooting themselves in the foot. There will be an annotation for opting out of this, much like the concurrency :shared option we have today.
We won't tackle sugaring the clone/split filter use case initially.
@andrewvc - can you clarify?
we decided to use a filter instance per pipeline pattern as we do today. This means that there will be one instance used across pipeline worker threads.
These seem to be saying opposite things... do you mean one instance per pipeline worker thread, or do you really mean a single instance across all pipeline worker threads?
One instance across all threads, just as we do today. One instance per pipeline, shared across all worker threads.
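The opt-out annotation mentioned above (the Java analogue of Ruby's concurrency :shared) could be checked by core via reflection before deciding whether to serialize filter invocations. ConcurrentSafe and needsSerialization are made-up names for illustration; the real annotation isn't specified in this thread.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical opt-out marker: one filter instance per pipeline, invoked
// serially by default; a plugin carrying this annotation may be invoked
// from worker threads concurrently.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ConcurrentSafe { }

final class SerialFilter { }

@ConcurrentSafe
final class ParallelFilter { }

public final class ConcurrencyCheck {
    // Core would consult this before deciding whether to lock around calls.
    static boolean needsSerialization(final Class<?> plugin) {
        return !plugin.isAnnotationPresent(ConcurrentSafe.class);
    }

    public static void main(final String... args) {
        System.out.println(needsSerialization(SerialFilter.class));   // prints true
        System.out.println(needsSerialization(ParallelFilter.class)); // prints false
    }
}
```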
}

@Override
public void flush(final boolean isShutdown) {
No need to include flush since it's an API we'll be deprecating soon.
*/
public Mutate(final LsConfiguration configuration, final LsContext context) {
    this.field = configuration.getString("ls.plugin.mutate.field");
    this.value = configuration.getString("ls.plugin.mutate.value");
Why do we need the boilerplate of ls.plugin.mutate.*? Why can't we just have configuration.getString("value")?
Can you go into a bit more detail as to how LsConfiguration would work? Is there a way to declare "field must be a string" as you can today? This is important for driving a visual builder in the future.
@andrewvc you're right, the prefixes aren't necessary :) Will remove them :)
@andrewvc I'd suggest that a plugin has to implement a method that returns Map<String, LsConfigTypeInfo> configSchema(), describing each key's expected value (type, deprecated, etc.).
Then we can use the return value of that for a visual builder, and also write validation logic based on this map into the ConfigCompiler :) I'll add some code illustrating that approach soon :)
I think that's fine for the scope here.
In the future we should improve this to drive the visual builder. It'd be great to support a nested description as well. I'd love @ycombinator's input here. What do we need to drive a pipeline builder UI? We should create a new issue for that discussion, however. I think parity with the existing API is good for v1 here.
In a perfect world (likely not v1) the interface of a plugin (properties, their data types, any additional validation rules, short descriptive documentation, etc.) could be described in a language-agnostic file. That file could then be used (perhaps after further "compilation") by LS core as well as the LS UI. I'm deliberately leaving out the specifics of "used".
Alternatively, such an interface definition file could be generated from one of these classes as well, if that's possible.
The crux of the problem I'm trying to solve here is sharing of definitions between core and UI so we don't have to duplicate them in both places, keep them in sync, etc.
Of course, we shouldn't let this concern weigh down v1 unless solving for it later from the v1 design is going to be difficult.
try {
    while (!stopped && inpt.hasNext()) {
        final String message = inpt.next();
        writer.push(Collections.singletonMap("message", message));
Can you talk me through why the writer writes Map objects, not Event objects? Was that more efficient given that a Map is more lightweight?
@andrewvc yea, that was just an idea for optimizing things a little more. It's kinda annoying that we have all these Events that don't hold any state. For filters and outputs I can see the value in having Event around for serialization and such, but for Inputs it's just a waste of memory to create Events up front, when all inputs in fact simply create Map<String, Object>.
* Pushes multiple events to the Queue, blocking for the given timeout if the Queue is not
* ready for a write.
* Guarantees that a return {@code != -1} means that all events were pushed to the Queue
* successfully and no partial writes of only a subset of the input events will ever occur.
We really can't make this guarantee with E2E ACKs; partial writes may happen.
@andrewvc I was under the impression that we want to fix that fact and force batch-wise writes to the PQ, to fix stuff like the HTTP input timing out?
To me it seems this is simply something to fix in the Queue implementation(s)?
Spoke with @original-brownbear; we decided not to guarantee the absence of partial writes.
My mistake, we'll just remove this initially.
(force-pushed 8e34ab6 to cf44a52)
Added an outline of how we could do config validation + exposing the config schema in cf44a52. IMO we could just do something like that, where plugins are forced to return a list of config specs and then have the validation happen in the plugin factory.
* @param millis Timeout in millis
* @return Sequence number of the first event or -1 if push failed
*/
long push(Collection<Map<String, Object>> events, long millis);
Spoke with @original-brownbear , we'll remove this in the MVP.
@andrewvc alright, timeout and batched writes removed for now as discussed. What else can I do here to move this along so we can get going on implementing on top of it? :)
Thanks, I think this pattern will work, but it is a bit verbose.
Have you considered an annotation + reflection based approach?
I have a working example over at https://github.com/jakelandis/skunkstash/blob/master/processors/src/main/java/org/logstash/skunk/plugin/processors/Translate.java#L19
@jakelandis how would you deal with nested hash-like structures as in the http poller? I think annotations are the easiest way to deal with most things. We'd also need to add an ... JSON Schema + Jackson would be an easy way to solve this from an implementation standpoint, but it's pretty ugly for plugin authors.
Not really tbh. I really hate that one for two reasons:
Again (as in the
We should try to show examples of configuring https://www.elastic.co/guide/en/logstash/current/plugins-inputs-http_poller.html using this new API.
(force-pushed 0f3ea63 to df9bb6c)
@andrewvc see https://github.com/elastic/logstash/pull/9137/files#diff-63d7bea0e5acbcf6cb19e590783f202cR13 and https://github.com/elastic/logstash/pull/9137/files#diff-7336563974b577c2998327d7bbb4581bR74 :) I didn't fully implement both plugins now, but I think I have a sufficient number of cases covered to see the way it works.
@andrewvc @original-brownbear Following up on our discussion yesterday, here's what the UI would like to know about each plugin, in some sort of structured format (preferably JSON-based):
I have no opinion on the exact format of such a spec. As long as all the information above is encoded in the spec, the UI can work with it.
@ycombinator that sounds very doable :) With the exception of the value constraints, we can easily serialize the current
That's awesome, @original-brownbear! Let's not worry about the value constraints for v1 then. We can add those in later and enhance the UI at that time as well. Thanks!
@original-brownbear I want to amend my statement in the previous comment :) Let's not worry about the range value constraint. It would be very nice to have enumerations expressed in the JSON spec. This would let the UI do things like show a drop-down menu for the options, rather than having the user type something and make a mistake.
@andrewvc I overrode the ... Ideally I would love to have a similar mechanism on the Java stuff.
private static final PluginConfigSpec<Path> CA_CERT_CONFIG =
    LsConfiguration.pathSetting("cacert");

private static final PluginConfigSpec<Map<String, String>> URLS_CONFIG =
This doesn't really encapsulate the urls setting in the poller. The value can be either a String, as it is here, or a complex hash: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-http_poller.html . That was my reason for us trying to figure out that case here.
My suspicion is that we'll wind up with Map<String, Object>, and use code to sort out the mess.
I have no idea how to make this programmatic for a UI.
@ycombinator, your thoughts?
Reading https://www.elastic.co/guide/en/logstash/current/plugins-inputs-http_poller.html#plugins-inputs-http_poller-urls, it looks like this is always expected to be a hash? The Map<String, String> type in the code on this line seems to mirror that expectation too. I guess I'm not seeing how this could be set as "just" a single String?
I have no idea how to make this programmatic for a UI.
One way to encode this in a spec for the UI would be to use some sort of a special designation for user-defined keys in a hash/object data type. Let me try to elaborate below.
First, consider a setting whose datatype is a hash, but with keys provided by the plugin itself. For example, the schedule setting. In this case, the spec for this setting could be something like the following (stealing from ES mapping syntax for illustration purposes only):
{
  "schedule": {
    "type": "object", // or "hash" if you prefer that
    "properties": {
      "cron": { "type": "string" },
      "every": { "type": "string" },
      "in": { "type": "string" },
      "at": { "type": "string" }
    }
  }
}
Now, consider the urls setting. The spec for that could be:
{
  "urls": {
    "type": "object", // or "hash" if you prefer that
    "properties": {
      "__user_defined__": {
        "type": "string",
        "max_count": -1 // to indicate that there could be an unlimited number of URLs
      }
    }
  }
}
Thoughts?
@ycombinator @andrewvc my thought was that we could programmatically encapsulate what ES does as PluginConfigSpec<Map<String, PluginConfigSpec>> and build up a nested mapping that way.
For some reason I forgot to add that part ... will be up soon.
@ycombinator it can be either a string or a hash. It's weird that way.
@original-brownbear ah, yes, that makes more sense.
Still, encoding this sort of thing into JSON is inherently strange. It's pretty complex to encode "can be a string or a map", and even after you do encode that, how do you render the UI?
So, here's a thought: maybe there's some sort of 'override' UI spec the plugin can bundle. In this case, we don't need the UI to support the simple syntax, just the full one. We could let the plugin author manually specify what the UI controls should be instead of doing that programmatically.
@andrewvc I'm good with that proposal — that the UI gets the "highest fidelity" syntax. We can always figure out optimizations/shortcuts later.
WDYT about my proposal for encoding plugin-specified keys vs. user-defined keys above?
@andrewvc here you go, quick sketch but the idea should be clear:
I think you can simply encode this as a nesting of LsConfiguration. This way the UI can also render it as such, with the ability to add and remove elements (that are essentially the same as the top-level element).
In this case, you could simply add a UI element that allows adding nested "rows" for each Map<String, LsConfiguration>-typed setting.
Hopefully this makes sense; happy to go into more detail, but I'm optimistic we can properly encode this for the UI :)
@ycombinator I tried to capture your proposal in some Java code.
I actually left out the spot with the flat hash I just noticed, my bad:
@SuppressWarnings("unchecked")
public static <T> PluginConfigSpec<Map<String, T>> requiredFlatHashSetting(
    final String name, Class<T> type) {
    //TODO: enforce subtype
    return new PluginConfigSpec(
        name, Map.class, null, false, true
    );
}

@SuppressWarnings("unchecked")
public static PluginConfigSpec<Map<String, LsConfiguration>> requiredNestedHashSetting(
    final String name, final Collection<PluginConfigSpec<?>> spec) {
    return new PluginConfigSpec(
        name, Map.class, null, false, true, spec
    );
}
Something like this, which captures flat sub-hashes, would be doable though :)
I think we can easily render that in that or a similar ES-style JSON for the UI.
@original-brownbear I have to admit, I'm a bit lost :) But that's okay! I mostly just want to know, from a UI POV, if we'll be able to encode the following:
- A setting whose data type is a hash. Almost 100% certain this is possible :)
- For such settings, there are two cases for the hash's keys:
  a. Plugin-defined, like in the case of the schedule setting, and
  b. User-defined, like in the case of the urls setting.
The UI needs to be able to distinguish between these two cases by looking at the spec, so it can decide whether to a) show "static" keys or b) make the key a text box field for the user to type something in, and also allow them to add N number of such rows.
So my question is, can this sort of distinction be generated from the Java API you are proposing?
@ycombinator yes, that's doable and was exactly the plan in the Java snippet above :)
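One way the plugin-defined vs. user-defined distinction could be generated from the Java side and rendered into the ES-mapping-style JSON sketched earlier. HashSettingSpec and toUiSpec are hypothetical names, not the proposal's real types; the "__user_defined__" marker follows @ycombinator's suggestion above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: a hash setting either carries fixed, plugin-defined keys (like
// `schedule`) or open-ended, user-defined keys (like `urls`), so a UI can
// render static labels or free-form rows accordingly.
final class HashSettingSpec {
    final String name;
    final List<String> fixedKeys; // empty => keys are user-defined
    final boolean userDefinedKeys;

    HashSettingSpec(final String name, final List<String> fixedKeys) {
        this.name = name;
        this.fixedKeys = fixedKeys;
        this.userDefinedKeys = fixedKeys.isEmpty();
    }

    // Renders something close to the ES-mapping-style JSON sketched above.
    Map<String, Object> toUiSpec() {
        final Map<String, Object> properties = new LinkedHashMap<>();
        if (userDefinedKeys) {
            properties.put("__user_defined__", Map.of("type", "string"));
        } else {
            for (final String key : fixedKeys) {
                properties.put(key, Map.of("type", "string"));
            }
        }
        return Map.of("type", "object", "properties", properties);
    }
}

public final class UiSpecDemo {
    public static void main(final String... args) {
        final HashSettingSpec schedule =
            new HashSettingSpec("schedule", List.of("cron", "every", "in", "at"));
        final HashSettingSpec urls = new HashSettingSpec("urls", List.of());
        System.out.println(schedule.toUiSpec());
        System.out.println(urls.toUiSpec());
    }
}
```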
@andrewvc what do you have in mind for next steps here? :)
Left some further feedback.
}

@Override
public long poll(final Event event, final long millis) {
Thinking through this, most people will need to write the same polling code over and over again.
I think we should create some simple default implementations here that people can use. I put two up here: https://github.com/original-brownbear/logstash/pull/1/files
WDYT? Batching is necessary for things like the ES filter, etc. It's pretty complicated to implement in terms of this API (I'm sure there are bugs in my code up there, for instance).
I'd rather we promote those two filter APIs (and make similar ones for the Output class) than have people writing this boilerplate over and over again.
Batching is necessary for things like the ES filter, etc
Yea maybe, but the efficient way of doing it is also highly plugin dependent. As of right now, I think it's fair to say that there are no plugins out there that actively make use of the batched interface in a quantifiable way. Otherwise, I wouldn't have found #8428 by mere chance :)
=> I'm not really comfortable with defining a batched API like that (yet). I'd much rather build a few plugins and see what works in the real world than guess the right approach here now tbh.
I don't see how it's plugin dependent? The only reason filters don't batch today is that it isn't possible because they can only get one event at a time. I'd rather not box us in.
BTW, how does the efficiency of https://github.com/original-brownbear/logstash/pull/1/files#diff-eb733fe2c6eb811410b0e0a88226263d look? It should be garbage-free, aside from a single new Event[] call per batch, which is pretty cheap given a reasonable batch size.
Also, looking forward to your feedback on the SimpleFilter API.
I'd rather we only expose these APIs than the low-level one here. The low-level one is so implementation-dependent it boxes us in. I'm also not sure who would actually want to use it directly; it's much more finicky, and I'm not sure how much it buys a programmer in any situation I can think of.
private static final PluginConfigSpec<Map<String, LsConfiguration>> URLS_CONFIG =
    LsConfiguration.requiredHashSetting(
        "urls", Arrays.asList(URL_METHOD_CONFIG, USER_CONFIG)
I don't see how this encapsulates an HTTP poller configuration. urls can either look like:
urls = {"elastic" => "http://elastic.co"}
or
urls = {"elastic" => {"method" => "get", "url" => "http://elastic.co", "headers" => {...}}}
Maybe this is too flexible for Java?
@andrewvc hah, you're right :) The "or" case that allows for two different schemas isn't covered by my proposal.
Tbh ... I'm tempted to vote for not allowing that in Java plugins. The only downside to making that simplification is that we'd have to adjust the config schema for the HTTP poller if we fully port it to Java (and the HTTP poller probably isn't the only case). Other than that, I only see upsides to simplifying this (especially when thinking about the UI!) ...
Do you think it's feasible to break with the flexibility of the Ruby implementation here for the sake of type safety?
Wow, I never responded here. At any rate, yes, I think it's fine to break with Ruby here.
One further thought: since the plugin objects have state, I take it we will have to instantiate one plugin per worker thread? That's tricky, but OK so long as there's some way to share state across instances of the plugin with the same ID across threads. I think that means we'll need each plugin to be able to share some sort of context object. I think there should be some sort of
@guyboertje can you take a look at the validators here and let us know what could be improved in this proposal based on your work there?
@guyboertje ping :) I'd like to be able to continue here, and that part will be quite a bit of code still. Better to speak up before I start moving on this if possible :)
It seems weird to me that you ask me to spend hours fully understanding what you are proposing and whether or not it suits the current custom validation that I cited, when it would take you 30 minutes to read the current custom validation and reach the same conclusion. If, after you look at the existing code, you say "we got that covered", then I'm good with that.
Just to give this some structure, I'll start linking the dependent PRs that I'm opening/merging that'll get us the necessary compiler logic for the Java plugins :) For now, waiting on #9320.
Closing in favour of continuing in #9342 :)
@andrewvc @danhermann @yaauie
Full suggestion for an API with: InputStream to Iterator<Event>, or better yet Iterator<Map<String, Object>> imo.
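The closing suggestion could be sketched like this: an input modeled as an Iterator<Map<String, Object>> over its source rather than a raw InputStream. LineIterator is an illustrative adapter I made up, not part of the proposal.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;

// Adapts a character source into the suggested Iterator<Map<String, Object>>
// shape: each line becomes one raw payload with a "message" key.
final class LineIterator implements Iterator<Map<String, Object>> {
    private final BufferedReader reader;
    private String next;

    LineIterator(final Reader source) {
        this.reader = new BufferedReader(source);
        advance();
    }

    private void advance() {
        try {
            next = reader.readLine();
        } catch (final IOException e) {
            next = null; // treat read failure as end of input in this sketch
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public Map<String, Object> next() {
        final Map<String, Object> event = Collections.singletonMap("message", (Object) next);
        advance();
        return event;
    }

    public static void main(final String... args) {
        final Iterator<Map<String, Object>> input = new LineIterator(new StringReader("a\nb"));
        while (input.hasNext()) {
            System.out.println(input.next().get("message")); // prints a, then b
        }
    }
}
```

Core could then wrap such an iterator to materialize Events lazily, consistent with the maps-instead-of-Events argument earlier in the thread.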